Retrieval Augmented Generation
When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare
Saeedeh Javadi, Sara Mirabi, Manan Gangar, Bahadorreza Ofoghi
In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed as a mitigation strategy, grounding model outputs in external, domain-specific documents. Yet, this approach can introduce errors when source documents contain outdated or contradictory information. This work investigates the performance of five LLMs in generating RAG-based responses to medicine-related queries. Our contributions are three-fold: i) the creation of a benchmark dataset using consumer medicine information documents from the Australian Therapeutic Goods Administration (TGA), where headings are repurposed as natural language questions, ii) the retrieval of PubMed abstracts using TGA headings, stratified across multiple publication years, to enable controlled temporal evaluation of outdated evidence, and iii) a comparative analysis of the frequency and impact of outdated or contradictory content on model-generated responses, assessing how LLMs integrate and reconcile temporally inconsistent information. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers. These results highlight that retrieval similarity alone is insufficient for reliable medical RAG and underscore the need for contradiction-aware filtering strategies to ensure trustworthy responses in high-stakes domains.
- Health & Medicine (1.00)
- Media > News (0.48)
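A contradiction-aware filtering step of the kind this abstract calls for could be sketched as follows. This is an illustrative assumption, not the authors' method: the `Passage` fields, the newer-evidence-wins rule, and the toy negation check all stand in for a real pipeline, where `contradicts` would be an NLI model scoring passage pairs.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Passage:
    text: str
    year: int
    score: float  # retrieval similarity

def filter_contradictions(
    passages: List[Passage],
    contradicts: Callable[[str, str], bool],
) -> List[Passage]:
    """Drop a passage when a *newer* retrieved passage contradicts it.

    One plausible contradiction-aware strategy for temporally
    inconsistent evidence; `contradicts` is a placeholder for an
    NLI model.
    """
    kept = []
    for p in passages:
        newer = [q for q in passages if q.year > p.year]
        if any(contradicts(p.text, q.text) for q in newer):
            continue  # outdated claim superseded by newer evidence
        kept.append(p)
    # Highest-similarity passages first, as a plain retriever would rank them.
    return sorted(kept, key=lambda p: p.score, reverse=True)

# Toy contradiction check: flags an explicit negation of the same claim.
def toy_contradicts(a: str, b: str) -> bool:
    return a.replace("is not", "is") == b or b.replace("is not", "is") == a

docs = [
    Passage("drug X is not safe in pregnancy", 2005, 0.91),
    Passage("drug X is safe in pregnancy", 2021, 0.90),
]
print([p.year for p in filter_contradictions(docs, toy_contradicts)])  # [2021]
```

Note that retrieval similarity alone ranks the outdated 2005 passage first (0.91 vs 0.90), which is exactly the failure mode the abstract describes; the filter removes it because newer evidence contradicts it.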
Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation
Thomas Cook, Richard Osuagwu, Liman Tsatiashvili, Vrynsia Vrynsia, Koustav Ghosal, Maraim Masoud, Riccardo Mattivi
Retrieval-Augmented Generation (RAG) systems often face limitations in specialized domains such as fintech, where domain-specific ontologies, dense terminology, and acronyms complicate effective retrieval and synthesis. This paper introduces an agentic RAG architecture designed to address these challenges through a modular pipeline of specialized agents. The proposed system supports intelligent query reformulation, iterative sub-query decomposition guided by keyphrase extraction, contextual acronym resolution, and cross-encoder-based context re-ranking. We evaluate our approach against a standard RAG baseline using a curated dataset of 85 question--answer--reference triples derived from an enterprise fintech knowledge base. Experimental results demonstrate that the agentic RAG system outperforms the baseline in retrieval precision and relevance, albeit with increased latency. These findings suggest that structured, multi-agent methodologies offer a promising direction for enhancing retrieval robustness in complex, domain-specific settings.
- Asia > Middle East > UAE (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
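The pipeline stages named in the abstract (acronym resolution, sub-query decomposition, context re-ranking) can be sketched minimally. Everything here is a placeholder assumption: the glossary is invented, the split-on-"and" decomposition stands in for the keyphrase-guided agent, and the token-overlap scorer stands in for the cross-encoder.

```python
# Assumed fintech glossary; a real system would use a curated ontology.
ACRONYMS = {"KYC": "know your customer", "NAV": "net asset value"}

def resolve_acronyms(query: str) -> str:
    # Expand known acronyms in place, leaving other words untouched.
    words = []
    for w in query.split():
        key = w.strip("?,.").upper()
        words.append(ACRONYMS.get(key, w))
    return " ".join(words)

def decompose(query: str) -> list[str]:
    # Trivial stand-in for iterative, keyphrase-guided sub-query
    # decomposition: split compound questions on "and".
    return [q.strip() for q in query.split(" and ")]

def rerank(candidates: list[str], query: str) -> list[str]:
    # Token-overlap scoring standing in for cross-encoder re-ranking.
    q = set(query.lower().split())
    return sorted(candidates, key=lambda c: len(q & set(c.lower().split())), reverse=True)

query = "What is NAV and how does KYC work?"
subqueries = [resolve_acronyms(q) for q in decompose(query)]
print(subqueries)  # ['What is net asset value', 'how does know your customer work?']

docs = ["the net asset value is computed daily", "customer onboarding steps"]
print(rerank(docs, subqueries[0])[0])  # the net asset value is computed daily
```

The design point the paper makes survives even this toy version: expanding "NAV" before retrieval lets lexically dense domain documents match the user's intent.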
Steering Over-refusals Towards Safety in Retrieval Augmented Generation
Utsav Maskey, Mark Dras, Usman Naseem
Safety alignment in large language models (LLMs) induces over-refusals -- where LLMs decline benign requests due to aggressive safety filters. We analyze this phenomenon in retrieval-augmented generation (RAG), where both the query intent and retrieved context properties influence refusal behavior. We construct RagRefuse, a domain-stratified benchmark spanning medical, chemical, and open domains, pairing benign and harmful queries with controlled context contamination patterns and sizes. Our analysis shows that context arrangement / contamination, domain of query and context, and harmful-text density trigger refusals even on benign queries, with effects depending on model-specific alignment choices. To mitigate over-refusals, we introduce SafeRAG-Steering, a model-centric embedding intervention that steers the embedding regions towards the confirmed safe, non-refusing output regions at inference time. This reduces over-refusals in contaminated RAG pipelines while preserving legitimate refusals.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Ukraine (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Security & Privacy (1.00)
- Law (0.70)
- Health & Medicine > Therapeutic Area (0.68)
- Banking & Finance (0.68)
- North America > United States (0.30)
- Asia > China > Shanghai > Shanghai (0.06)
- Asia > Singapore (0.05)
- Asia > Singapore (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > Indonesia > Bali (0.04)
- (5 more...)
- Media (0.46)
- Banking & Finance (0.46)
- Leisure & Entertainment (0.46)
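One common way to realize an embedding intervention like the one described above is additive activation steering. This sketch assumes the steering vector is the difference between mean "safe answer" and mean "over-refusal" embeddings, which may differ from SafeRAG-Steering's actual construction; vectors are plain Python lists for clarity.

```python
def mean(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def steering_vector(safe_embs, refusal_embs):
    # Assumed construction: difference of class means (safe minus refusal).
    safe_mu, ref_mu = mean(safe_embs), mean(refusal_embs)
    return [s - r for s, r in zip(safe_mu, ref_mu)]

def steer(h, v, alpha=1.0):
    # Shift a hidden state toward the safe, non-refusing region at
    # inference time; alpha controls intervention strength.
    return [hi + alpha * vi for hi, vi in zip(h, v)]

safe = [[1.0, 0.0], [0.8, 0.2]]     # toy embeddings of non-refusing outputs
refuse = [[0.0, 1.0], [0.2, 0.8]]   # toy embeddings of over-refusals
v = steering_vector(safe, refuse)
print([round(x, 3) for x in steer([0.5, 0.5], v, alpha=0.5)])  # [0.9, 0.1]
```

Because the shift is applied only at inference time, the base model's alignment is untouched; harmful queries far from the contaminated-context region still elicit legitimate refusals.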
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation
Reza Yousefi Maragheh, Pratheek Vadla, Priyank Gupta, Kai Zhao, Aysenur Inan, Kehui Yao, Jianpeng Xu, Praveen Kanumala, Jason Cho, Sushant Kumar
Retrieval-Augmented Generation (RAG) has shown promise in enhancing recommendation systems by incorporating external context into large language model prompts. However, existing RAG-based approaches often rely on static retrieval heuristics and fail to capture nuanced user preferences in dynamic recommendation scenarios. In this work, we introduce ARAG, an Agentic Retrieval-Augmented Generation framework for personalized recommendation, which integrates a multi-agent collaboration mechanism into the RAG pipeline. To better capture both long-term and session-level user behavior, ARAG leverages four specialized LLM-based agents: a User Understanding Agent that summarizes user preferences from long-term and session contexts, a Natural Language Inference (NLI) Agent that evaluates semantic alignment between candidate items retrieved by RAG and the inferred intent, a Context Summary Agent that condenses the NLI Agent's findings, and an Item Ranker Agent that generates a ranked list of recommendations based on contextual fit. We evaluate ARAG across three datasets. Experimental results demonstrate that ARAG significantly outperforms standard RAG and recency-based baselines, achieving up to a 42.1% improvement in NDCG@5 and 35.5% in Hit@5. We also conduct an ablation study to analyze the contribution of each ARAG component. Our findings highlight the effectiveness of integrating agentic reasoning into retrieval-augmented recommendation and provide new directions for LLM-based personalization.
- North America > United States > California > Santa Clara County > Sunnyvale (0.06)
- North America > United States > Washington > King County > Bellevue (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
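The metrics reported above (NDCG@5 and Hit@5) have standard definitions that can be computed directly. This is a generic implementation with binary relevance, not the authors' evaluation code.

```python
import math

def ndcg_at_k(ranked, relevant, k=5):
    # DCG with binary gains: a hit at rank i contributes 1/log2(i+2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal DCG: all relevant items packed at the top of the list.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def hit_at_k(ranked, relevant, k=5):
    # 1.0 if any relevant item appears in the top k, else 0.0.
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

ranked = ["b", "a", "c", "d", "e"]  # model's ranking
relevant = {"a"}                    # ground-truth relevant item
print(round(ndcg_at_k(ranked, relevant), 4))  # 0.6309
print(hit_at_k(ranked, relevant))             # 1.0
```

In this toy case the relevant item sits at rank 2, so Hit@5 is perfect while NDCG@5 penalizes it relative to a rank-1 placement, which is why the paper reports both.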
Lessons from A Large Language Model-based Outdoor Trail Recommendation Chatbot with Retrieval Augmented Generation
Julia Ann Mathew, Suining He
The increasing popularity of outdoor recreational activities (such as hiking and biking) has boosted the demand for conversational AI systems that provide informative and personalized suggestions on outdoor trails. Challenges arise around (1) how to provide accurate outdoor trail information via conversational AI; and (2) how to enable usable and efficient recommendation services. To address the above, this paper discusses the preliminary and practical lessons learned from developing Judy, an outdoor trail recommendation chatbot based on a large language model (LLM) with retrieval augmented generation (RAG). To gain concrete system insights, we performed case studies with outdoor trails in Connecticut (CT), US. We conducted web-based data collection, outdoor trail data management, and LLM performance studies on the RAG-based recommendation. Our experimental results demonstrate the accuracy, effectiveness, and usability of Judy in recommending outdoor trails based on the LLM with RAG.
- North America > United States > Connecticut (0.26)
- Europe > Germany > Hamburg (0.04)
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme, Thomas Ströhle, Ruth Breu
Retrieval-Augmented Generation (RAG) has advanced significantly in recent years. The complexity of RAG systems, which involve multiple components (such as indexing, retrieval, and generation) along with numerous other parameters, poses substantial challenges for systematic evaluation and quality enhancement. Previous research highlights that evaluating RAG systems is essential for documenting advancements, comparing configurations, and identifying effective approaches for domain-specific applications. This study systematically reviews 63 academic articles to provide a comprehensive overview of state-of-the-art RAG evaluation methodologies, focusing on four key areas: datasets, retrievers, indexing and databases, and the generator component. We observe the feasibility of an automated evaluation approach for each component of a RAG system, leveraging an LLM capable of both generating evaluation datasets and conducting evaluations. In addition, we found that further practical research is essential to provide companies with clear guidance on the do's and don'ts of implementing and evaluating RAG systems. By synthesizing evaluation approaches for key RAG components and emphasizing the creation and adaptation of domain-specific datasets for benchmarking, we contribute to the advancement of systematic evaluation methods and the improvement of evaluation rigor for RAG systems. Furthermore, by examining the interplay between automated approaches leveraging LLMs and human judgment, we contribute to the ongoing discourse on balancing automation and human input, clarifying their respective contributions, limitations, and challenges in achieving robust and reliable evaluations.
In recent years, Large Language Models (LLMs) have made significant progress in research and have grown increasingly popular [1]. However, LLMs face several challenges, including issues with hallucinations caused by insufficient context [2], as well as limitations in their learned content, which prevent them from addressing questions requiring specific or proprietary information [1].
- Europe > Austria > Tyrol > Innsbruck (0.05)
- Europe > Switzerland (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
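The component-wise automated evaluation the survey envisions can be outlined as a small harness. The token-overlap faithfulness judge below is a stand-in assumption for an actual LLM judge, since the survey does not prescribe a specific judge model or prompt.

```python
def judge_faithfulness(answer: str, context: str) -> float:
    # Stand-in judge: fraction of answer tokens supported by the
    # retrieved context. A real harness would prompt an LLM judge here.
    ans, ctx = answer.lower().split(), set(context.lower().split())
    return sum(t in ctx for t in ans) / len(ans) if ans else 0.0

def evaluate_rag(samples):
    # samples: list of (question, retrieved_context, generated_answer)
    # triples; returns the mean faithfulness over the dataset, scoring
    # the generator component against the retriever's output.
    scores = [judge_faithfulness(answer, context) for _, context, answer in samples]
    return sum(scores) / len(scores)

samples = [
    ("What is RAG?",
     "rag grounds llm answers in retrieved documents",
     "rag grounds answers in retrieved documents"),
]
print(evaluate_rag(samples))  # 1.0
```

Separating the judge from the harness mirrors the survey's point about balancing automation and human input: the same `evaluate_rag` loop can take scores from an LLM judge, a heuristic, or a human annotator.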
Can Language Models Critique Themselves? Investigating Self-Feedback for Retrieval Augmented Generation at BioASQ 2025
Agentic Retrieval Augmented Generation (RAG) and 'deep research' systems aim to enable autonomous search processes where Large Language Models (LLMs) iteratively refine outputs. However, applying these systems to domain-specific professional search, such as biomedical research, presents challenges, as automated systems may reduce user involvement and misalign with expert information needs. Professional search tasks often demand high levels of user expertise and transparency. The BioASQ CLEF 2025 challenge, using expert-formulated questions, can serve as a platform to study these issues. We explored the performance of current reasoning and non-reasoning LLMs such as Gemini-Flash 2.0, o3-mini, o4-mini, and DeepSeek-R1. A key aspect of our methodology was a self-feedback mechanism where LLMs generated, evaluated, and then refined their outputs for query expansion and for multiple answer types (yes/no, factoid, list, ideal). We investigated whether this iterative self-correction improves performance and whether reasoning models are more capable of generating useful feedback. Preliminary results indicate varied performance for the self-feedback strategy across models and tasks. This work offers insights into LLM self-correction and informs future work on comparing the effectiveness of LLM-generated feedback with direct human expert input in these search systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Bavaria > Regensburg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
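The generate-evaluate-refine loop studied above follows a common pattern that can be sketched generically. Here `toy_generate` and `toy_critique` are hypothetical stand-ins for LLM calls; the toy critic enforces a BioASQ-style yes/no answer prefix, which is an assumption for illustration only.

```python
def self_refine(question, generate, critique, max_rounds=3):
    # Generate an initial answer, then loop: critique it, and if the
    # critic returns feedback, regenerate conditioned on that feedback.
    answer = generate(question, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback is None:  # critic is satisfied; stop early
            break
        answer = generate(question, feedback=feedback)
    return answer

# Hypothetical stand-ins for LLM calls.
def toy_generate(question, feedback):
    # Only fixes its formatting once the critic complains.
    return "yes, aspirin inhibits COX" if feedback else "aspirin inhibits COX"

def toy_critique(question, answer):
    # Demands a yes/no prefix, as in BioASQ yes/no answer types.
    return None if answer.startswith(("yes", "no")) else "start with yes/no"

print(self_refine("Does aspirin inhibit COX?", toy_generate, toy_critique))
# yes, aspirin inhibits COX
```

The `max_rounds` cap matters in practice: the paper's mixed results suggest self-feedback does not always converge, so an unbounded loop could oscillate between equally flawed drafts.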